Double-Center-Based Iris Localization and Segmentation in Cooperative Environment with Visible Illumination

Iris recognition is considered one of the most accurate and reliable biometric technologies and is widely used in security applications. Iris segmentation and iris localization, two important preprocessing tasks for iris biometrics, jointly determine the valid iris region of the input eye image; however, iris images captured in user non-cooperative and visible-illumination environments often suffer from adverse noise (e.g., light reflection, blurring, and glasses occlusion), which challenges many existing segmentation-based, parameter-fitting localization methods. To address this problem, we propose a novel double-center-based end-to-end iris localization and segmentation network. Unlike many previous iris localization methods, which apply extensive post-processing (e.g., integro-differential operator-based or circular Hough transform-based fitting) to an iris or contour mask to fit the inner and outer circles, our method directly predicts the inner and outer circles of the iris on the feature map. Specifically, an anchor-free, center-based double-circle iris-localization network and an iris mask segmentation module are designed to directly detect the circular boundaries of the pupil and iris and to segment the iris region in an end-to-end framework. To facilitate efficient training, we propose a concentric sampling strategy based on the center distribution of the inner and outer iris circles. Extensive experiments on four challenging iris datasets show that our method achieves excellent iris-localization performance; in particular, it achieves 84.02% box IoU and 89.15% mask IoU on NICE-II. On the three sub-datasets of MICHE, our method achieves 74.06% average box IoU, surpassing existing methods by 4.64%.


Introduction
Iris recognition is one of the most reliable biometric technologies and is widely applied in intelligent unlocking, border control, and forensics, among others [1][2][3][4]. A complete iris-recognition system usually consists of image acquisition, iris segmentation and localization, normalization, feature extraction, and matching. Figure 1 illustrates the key steps of iris-image preprocessing: localization, segmentation, and normalization. Iris localization aims to detect the inner and outer boundaries of the iris region; iris segmentation generates an iris mask that distinguishes iris from non-iris pixels. Normalization aligns any two iris images so that they can be compared. Preprocessing yields the normalized iris texture image and the iris/noise mask image. As an important part of iris-recognition preprocessing, iris segmentation and localization jointly define the region used for feature extraction and matching and therefore directly affect overall iris-recognition performance.
Most previous studies have focused on iris images captured in cooperative environments (e.g., near-infrared illumination, high user collaboration, close acquisition distance, and stop-gaze verification), and two widely used localization methods are Daugman's integro-differential operator [5] and Wildes's circular Hough transform [6]. Recently, iris recognition has been extended to non-cooperative environments (e.g., long distances, limited user cooperation, visible lighting, and mobile devices) because such settings impose minimal restrictions on user cooperation and imaging conditions; however, iris images captured in non-cooperative environments often contain various types of noise, such as gaze bias, iris rotation, aniridia reflection, specular reflection, motion/defocus blur, and eyelid/lash/hair/glasses occlusion (see Figure 2), which makes iris localization challenging. Recent deep learning-based methods use convolutional neural networks (CNNs) to segment the iris in the image and then apply the circular Hough transform [6] to fit the inner- and outer-circle parameters on the iris mask. These methods achieve good iris mask segmentation, but their localization is very coarse, and its accuracy remains below the standard required for iris recognition.
Other CNN-based methods, such as IrisParseNet [7] and NIR-Zhang [8], use semantic segmentation to predict inner-/outer-circle boundary masks and then apply post-processing to obtain parameterized inner and outer circles. Although this improves localization, it requires extensive, carefully refined post-processing, and the accuracy of the inner- and outer-circle localization is easily degraded by fuzzy contour boundaries.
Based on these issues, we consider the following three questions: (1) How can we achieve more accurate and robust localization on iris images captured in a non-cooperative environment with visible illumination? (2) Performing post-processing (e.g., integro-differential operator-based or circular Hough transform-based methods) on a pupil or iris mask to fit the inner- and outer-circle parameters (x, y, radius) is time-consuming and not robust; can the model directly output the inner- and outer-circle localization results without any post-processing? (3) Building on (2), can we obtain a simple but effective end-to-end model that integrates localization and segmentation in one network? To answer these questions, and considering the natural biological characteristics of the iris region, we propose a double-center-based method to localize and segment the iris region. Our main contributions are summarized as follows: (1) We propose to locate the inner and outer iris circles directly on feature maps by detecting their center points and regressing their radii, which addresses the inaccurate and non-robust iris localization on visible-light iris images. Compared with existing methods that apply extensive post-processing to the predicted mask to obtain the circle parameters, our approach is post-processing-free: the output of the localization module is the final inner-/outer-circle localization result. (2) By explicitly analyzing the distribution of the pupil and iris center points, we propose a novel auxiliary sampling strategy that accelerates model training on nonstandard iris images, that is, images containing irrelevant regions such as the chin, nose, and surroundings. (3) We design a dedicated end-to-end iris localization and segmentation framework that is simple and effective and achieves excellent performance on multiple benchmarks.
The proposed method provides a good foundation for iris localization and segmentation in a non-cooperative environment.
The remainder of this paper is organized as follows: Section 2 introduces related work on iris segmentation and localization. Section 3 details the proposed iris localization and segmentation method. Section 4 presents ablation studies and compares and analyzes the experimental results. Section 5 summarizes the research and concludes the study.

Related Research
Most traditional iris-segmentation methods use a circle or ellipse to locate the inner and outer boundaries of the iris, then use differences in the gray-level histogram to exclude superimposed occlusions such as eyelashes, shadows, glasses, or reflections, and thereby infer the iris region [3].
For iris localization, two widely used baseline methods are Daugman's integro-differential operator [5] and Wildes's circular Hough transform [6]. The integro-differential operator searches the circle parameter space for the largest intensity difference and achieves high precision; however, it can take a relatively long time [9]. The circular Hough transform finds optimal circle parameters through a voting procedure on a binary edge image and is often applied to detect circular or elliptical boundaries; it is relatively insensitive to broken contours of 2D objects in the binary image [10].
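The voting idea behind the circular Hough transform can be illustrated with a minimal pure-Python sketch. It is not the implementation used in [6] or [10]: the function name `circular_hough`, the angular sampling parameter `n_angles`, and the representation of the binary edge image as a list of edge-pixel coordinates are illustrative assumptions. Each edge pixel votes for every center lying at distance r from it, and the (center, radius) cell with the most votes wins.

```python
import math
from collections import Counter


def circular_hough(edge_points, radii, n_angles=90):
    """Return the (cx, cy, r) accumulator cell with the most votes.

    edge_points: iterable of integer (x, y) coordinates of edge pixels
        (an assumed stand-in for a binary edge image).
    radii: candidate radii in pixels.
    n_angles: angular samples per (edge point, radius) when casting votes.
    """
    votes = Counter()
    for x, y in edge_points:
        for r in radii:
            centers = set()  # each edge point votes at most once per cell
            for i in range(n_angles):
                theta = 2.0 * math.pi * i / n_angles
                centers.add((round(x - r * math.cos(theta)),
                             round(y - r * math.sin(theta))))
            for cx, cy in centers:
                votes[(cx, cy, r)] += 1
    (cx, cy, r), _ = votes.most_common(1)[0]
    return cx, cy, r
```

For example, given the 12 integer points at distance 5 from (20, 25), `circular_hough(points, range(3, 8))` recovers `(20, 25, 5)`. Practical implementations typically vote into a dense accumulator array after edge detection; this sketch only shows the voting step.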
Building on these two basic methods, many later approaches further improved accuracy and efficiency. For example, ref. [11] applied an L1 norm to suppress noisy texture before iris localization, ref. [12] applied region clustering before localization to narrow the parameter search range, ref. [13] proposed an integro-differential constellation to reduce computation time, and ref. [14] applied the Viterbi algorithm to gradient maps of iris images to find coarse low-resolution contours. These traditional methods rely heavily on prior assumptions and low-level image features, so they have low accuracy and poor robustness to noise.
Recently, deep learning-based semantic segmentation methods have achieved higher accuracy in iris segmentation. Unlike traditional pixel-based iris-segmentation methods, deep learning-based methods use high-level semantic features and estimate iris masks end to end. The first CNN-based iris-segmentation method is HCNNs [15]; other CNN-based iris-segmentation methods include [16][17][18][19][20][21]. These methods use convolutions to extract semantic features and segment the iris foreground, achieving good iris mask segmentation.
Although CNN-based iris-segmentation methods have achieved good results, most of them predict only iris masks and do not perform the more important task of iris localization. Ref. [15] first moved the circular Hough transform from the original iris image to the iris masks to generate candidate circular iris boundaries and then used two quality measures to select the best inner and outer iris boundaries.
Current CNN-based methods first segment the image to obtain the iris mask and then apply extensive post-processing, such as Daugman's integro-differential operator or the Hough transform, to fit the inner and outer circles on the iris mask (Figure 3a); similar methods are presented in [22,23]. These methods work well on iris images taken in a cooperative environment (e.g., near-infrared illumination, high user collaboration, close acquisition distance, and stop-gaze verification), but their performance is mediocre on iris images taken in non-cooperative, visible-light environments. IrisParseNet [7] is the first to use segmentation to predict the inner and outer boundaries of the iris: while predicting the iris mask, the model simultaneously predicts the pupil and outer iris boundary masks, as shown in Figure 3b. NIR-Zhang [8] uses two completely independent models for iris segmentation and iris localization, as shown in Figure 3c; its localization model predicts pupil and iris masks and then uses post-processing to fit the contour and circle parameters. However, such extensive post-processing to fit a circle to a boundary mask is time-consuming and not robust on iris images captured in non-cooperative environments with visible illumination, because the localization results are easily affected by irregular contour edges.
To obtain an efficient, post-processing-free iris-localization solution and achieve more accurate and robust localization on iris images captured in a non-cooperative environment, we propose a center-based iris localization and segmentation method in which localization relies on a feature map rather than on the iris mask (see Figure 3d). Inspired by the use of heat maps for face keypoint detection [24], we treat the inner and outer circles as two objects to locate: we first locate the two centers and then regress each radius from its center.
A segmentation branch embedded after the localization module segments iris and background pixels to obtain the iris mask. Our method is fast and accurate and does not need any post-processing: the model directly outputs the inner/outer circles $(x_k, y_k, r_k)$, $k \in \{\text{inner}, \text{outer}\}$, and the iris mask.

Methods
The architecture of the proposed ICSNet, shown in Figure 4, consists mainly of a localization module and a segmentation module. The localization module consists of two subnets, a classification head and a radius regression head, which predict the inner- and outer-circle bounding boxes. The segmentation module consists of an RoI crop and a mask head, which segment the iris mask within the detected region. The proposed concentric sampling strategy is an auxiliary structure that exists only during training and adds no cost at inference time.
This section first introduces the localization module, the essential iris-localization component of our model. Next, we present the mask-segmentation branch that follows the localization module. Lastly, we introduce the sampling strategy.
Figure 4. Architecture of ICSNet. Subnets in the three blue boxes are the center localization head, radius regression head, and mask head. Subnets in the orange box are the concentric sampling operations, and red arrows indicate the data flow; the concentric sampling strategy adds no inference cost because its auxiliary structures exist only during training. The localization module includes the classification and regression heads, which predict the circular bounding boxes. The mask module includes the crop operation and the mask head, which crop the iris region from the feature map and produce the segmented iris mask.

Double-Center Localization
Current iris localization methods fit the inner/outer circles on an iris or contour mask, and these fitting methods can be divided into integro-differential operator-based and circular Hough transform-based methods; however, owing to the noise in non-cooperative environments with visible light, these methods are susceptible to irregular mask or contour edges, resulting in poor robustness and accuracy. To address these issues, we propose a double-center-based iris-localization method in which the model directly predicts the inner and outer iris circles from a feature map.
Given an input image $I \in \mathbb{R}^{W \times H \times 3}$, we adopt the Gaussian kernel of CornerNet [25] to produce a ground-truth heat map $Y \in [0, 1]^{W \times H \times C}$ with entries $Y_{x,y,c}$, where $(x, y)$ is the location and $C$ is the number of center-point categories. In our experiments, we set $C = 2$ (inner or outer circle). The heat map has the same size as the input because our backbone is an encoder-decoder fully convolutional network. A prediction $\hat{Y}_{x,y,c} = 1$ corresponds to a detected center point, whereas $\hat{Y}_{x,y,c} = 0$ corresponds to background.
For each ground-truth center point $p = (p_x, p_y)$, we compute its discretized equivalent $\tilde{p} = \lfloor p / s \rfloor$, where $s$ is the scale ratio between the input and the heat map ($s = 1$ in our setting, since they have the same size); we then splat the center points $(\tilde{p}_x, \tilde{p}_y)$ onto the heat map using a Gaussian kernel, as in [26]. The peak of each Gaussian is treated as a positive sample, and all other pixels are treated as negative samples. We train with the modified focal loss [27]. The localization module follows the backbone; as shown in Figure 4, it consists of three convolution layers and a sigmoid function. It predicts two heat maps with values in [0, 1], and the location of the maximum value in each heat map is the predicted center point $(x_k, y_k)$, $k \in \{\text{inner}, \text{outer}\}$.
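A minimal plain-Python sketch of the ground-truth heat map and the modified focal loss is given below. It rests on assumptions not stated in this section: a fixed Gaussian width `sigma` (CornerNet and CenterNet derive it from the object size), and the CenterNet form of the penalty-reduced focal loss with exponents alpha = 2 and beta = 4; the exact formulation in [27] may differ. The helper names `splat_gaussian` and `modified_focal_loss` are hypothetical.

```python
import math


def splat_gaussian(height, width, center, sigma):
    """Ground-truth heat map for one integer center (CornerNet/CenterNet-style).

    `sigma` is a placeholder: here it is fixed rather than derived from object size.
    The value is exactly 1.0 at the center and decays with distance elsewhere.
    """
    cx, cy = center
    return [[math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
             for x in range(width)]
            for y in range(height)]


def modified_focal_loss(pred, gt, alpha=2.0, beta=4.0, eps=1e-6):
    """Penalty-reduced focal loss in the CenterNet form (alpha=2, beta=4 assumed).

    pred, gt: 2D lists of the same shape; gt == 1.0 marks the Gaussian peak.
    """
    pos_loss, neg_loss, num_pos = 0.0, 0.0, 0
    for pred_row, gt_row in zip(pred, gt):
        for p, y in zip(pred_row, gt_row):
            p = min(max(p, eps), 1.0 - eps)  # keep both logarithms finite
            if y == 1.0:
                # Positive sample: focal term down-weights already-confident peaks.
                pos_loss += (1.0 - p) ** alpha * math.log(p)
                num_pos += 1
            else:
                # Negative sample: (1 - y)^beta reduces the penalty near a peak.
                neg_loss += (1.0 - y) ** beta * p ** alpha * math.log(1.0 - p)
    return -(pos_loss + neg_loss) / max(num_pos, 1)
```

A prediction that reproduces the target heat map yields a much smaller loss than a uniform prediction of 0.5, which is the behavior the penalty-reduced weighting is designed to give.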

Radius Regression
Owing to the unique physiological characteristics of the iris region, the centers of the inner and outer circles appear in pairs; therefore, we set up two branches responsible for the inner- and outer-circle radius regressions.
Let $p_k = (x_k, y_k)$ be the circle-box center points, where $k$ denotes the center-point category (inner or outer circle). We use the heat map $\hat{Y}$ to predict each center point $p_k$ and then regress the circle radius $r_k$ for each center point. We train the radius regression with an L1 loss, defined as follows: We do not normalize the radius; instead, we regress directly in raw pixel coordinates. To make the regression more accurate, we treat all samples in the Gaussian region as positive samples, each of which is responsible for predicting the object radius. Samples outside the Gaussian region are negative samples and do not predict the radius.
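Since the radius-loss expression is not spelled out in this section, the following is only a hypothetical sketch of an L1 radius loss restricted to the Gaussian region. It assumes that a pixel belongs to the Gaussian region when its ground-truth heat-map value reaches an invented threshold `pos_thresh`, that every such pixel regresses the same raw-pixel radius, and that the loss is averaged over those pixels.

```python
def radius_l1_loss(pred_radius, gt_heatmap, gt_radius, pos_thresh=0.5):
    """Hypothetical L1 radius loss over the Gaussian region of one circle category.

    pred_radius: 2D list of predicted radii in raw pixels (no normalization).
    gt_heatmap: 2D list of ground-truth Gaussian values for the same category.
    gt_radius: ground-truth radius in raw pixels.
    pos_thresh: invented cut-off that defines the Gaussian region; pixels below it
        are negative samples and do not contribute to the loss.
    """
    total, count = 0.0, 0
    for r_row, y_row in zip(pred_radius, gt_heatmap):
        for r_hat, y in zip(r_row, y_row):
            if y >= pos_thresh:  # positive sample: must predict the radius
                total += abs(r_hat - gt_radius)
                count += 1
    return total / max(count, 1)  # average over positive samples
```

For example, with `gt_heatmap = [[0.1, 0.6], [1.0, 0.2]]`, predictions of 30 and 34 at the two positive pixels, and a ground-truth radius of 32, the loss is 2.0 regardless of what is predicted at the negative pixels.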
Additionally, we predict an offset for each center point to recover the discretization error caused by the integer operation and the output stride. The offset is trained with an L1 loss, and only the pixel at the object center directly predicts the offset value; the loss is defined as follows: In our experiments, we set $\lambda_{size} = 1$ and $\lambda_{off} = 0.1$, following [28].
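The offset target and its L1 loss can be sketched as follows, assuming the CenterNet-style definition in which the target is the fractional part lost when the center is floored at scale ratio `s`. The function names are hypothetical. With integer annotations and `s = 1` the target is zero; it becomes non-zero for sub-pixel annotations or larger scale ratios.

```python
import math


def offset_target(center, s):
    """Integer heat-map cell of a center and the sub-pixel offset lost by flooring.

    center: (x, y) in input-image pixels, possibly fractional.
    s: scale ratio between input and heat-map resolution (1 in this paper's setup).
    """
    x, y = center
    cell = (math.floor(x / s), math.floor(y / s))
    offset = (x / s - cell[0], y / s - cell[1])
    return cell, offset


def offset_l1_loss(pred_offset, target_offset):
    """L1 loss for one object, evaluated only at its center cell (assumed form)."""
    return sum(abs(p - t) for p, t in zip(pred_offset, target_offset))
```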

Segmentation Module
After the localization module, we obtain the inner and outer centers $(\hat{x}_c, \hat{y}_c)$ and their radii $\hat{r}_c$. Based on the detected outer-circle center point, we crop an $\hat{r} \times \hat{r}$ RoI region from the feature map as the input to the mask head. We select only one iris RoI, determined by the outer-circle center point with the highest confidence, thereby avoiding time-consuming post-processing.
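The single-RoI crop can be sketched as below for one feature channel. Assumptions: the window is centered on the outer-circle center, its side length is passed in (following the text, the predicted radius rounded to an integer), and pixels falling outside the feature map are zero-filled; `crop_roi` is a hypothetical helper, not the paper's code.

```python
def crop_roi(feature, center, side):
    """Crop a side x side window centered on `center` from a 2D feature map.

    feature: 2D list indexed as feature[y][x] (one channel).
    center: integer (x, y) of the selected outer-circle center.
    Pixels outside the map are zero-filled (an assumption) so the RoI
    always has the requested size.
    """
    cx, cy = center
    height, width = len(feature), len(feature[0])
    x0, y0 = cx - side // 2, cy - side // 2
    return [[feature[y][x] if 0 <= y < height and 0 <= x < width else 0.0
             for x in range(x0, x0 + side)]
            for y in range(y0, y0 + side)]
```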
The mask head comprises three convolution layers and a sigmoid function. Each layer consists of a 3 × 3 convolution with stride 2 and padding 1, ReLU activation, and batch normalization (see Figure 4). The mask branch outputs the iris mask within the RoI region and is trained with a cross-entropy loss.
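Two pieces of arithmetic help when reading the mask head: each 3 × 3, stride-2, padding-1 convolution roughly halves the spatial size, so three of them shrink an RoI by about a factor of 8, and the sigmoid output is compared with the binary iris mask through cross-entropy. The sketch below assumes the binary form of cross-entropy averaged over pixels; the helper names are hypothetical.

```python
import math


def conv_out_size(n, kernel=3, stride=2, padding=1):
    """Output length of one convolution along a spatial axis (floor formula)."""
    return (n + 2 * padding - kernel) // stride + 1


def mask_head_out_size(n, num_layers=3):
    """Spatial size after the mask head's stacked 3x3, stride-2, padding-1 layers."""
    for _ in range(num_layers):
        n = conv_out_size(n)
    return n


def binary_cross_entropy(probs, labels, eps=1e-7):
    """Mean pixel-wise cross-entropy between sigmoid outputs and a 0/1 iris mask."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(probs)
```

For a 96 × 96 RoI, `mask_head_out_size(96)` gives 12, that is, one eighth of the RoI side.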

Total Loss
The total loss $L$ is a weighted sum of the localization loss $L_{cls}$, the regression loss $L_{reg}$, and the mask loss $L_{mask}$:
$$L = w_{loc} L_{cls} + w_{reg} L_{reg} + w_{mask} L_{mask},$$
where $L_{cls}$ is based on the modified focal loss [28], $L_{reg}$ is calculated according to Equation (2), and $L_{mask}$ is the cross-entropy loss. In our experiments, we set $w_{loc}$, $w_{reg}$, and $w_{mask}$ to 1, 1, and 1, respectively.

Backbone
The backbone network of the model is an encoder-decoder fully convolutional network based on deep layer aggregation (DLA) [29], and it can accept any input size. The encoder uses the basic structure block of DLA. The decoder uses bilinear up-sampling followed by a single convolution layer consisting of a 3 × 3 convolution with stride 2, ReLU activation, and batch normalization. Both the encoder and the decoder have a depth of 5, and shortcut connections link the encoder blocks to the decoder layers. The output feature map has 32 channels.

Concentric Sampling Strategy
Given the natural biological characteristics of the iris region, the centers of the inner and outer circles should coincide. Because of the camera device and the non-cooperative environment, however, the two centers in the captured image are occasionally slightly shifted, as shown by the annotated circles in Figure 5a,b, and their Gaussian heat maps overlap, as shown in Figure 5c. We therefore define a concentric Gaussian region (CGR), in which the inner and outer circles share the same Gaussian value, as follows: where $Y_{i,j,c}$ denotes the original Gaussian distribution, computed as in [30], $c$ denotes the category (inner or outer), and $\alpha$ is a hyper-parameter that weights the inner and outer Gaussian distributions in the overlapping region; the resulting value is then clipped to the range [0, 0.5]. The inner/outer center points and the pixels in their four adjacent directions are high-activation points; hence, we set their heat-map values to 1.0 and 0.8, respectively.
For the Gaussian distribution, the pixel at the center of the circular box is a positive sample, and all other pixels are negative samples. We restrict both the inner and the outer Gaussian distributions to the inner-circle region, which avoids an imbalance in loss contribution caused by the different Gaussian areas and makes the model focus quickly on the important regions. The concentric sampling strategy adds no cost at inference time, as its auxiliary structures exist only during training.
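Since the exact CGR expression is not given in this section, the following is only a hypothetical illustration of the ideas described above, not the paper's formula. It assumes a shared value `alpha * Y_inner + (1 - alpha) * Y_outer` wherever both Gaussians exceed an invented threshold `overlap_thresh`, clipped to [0, 0.5]; it restricts both maps to the inner circle and sets each center to 1.0 and its four neighbors to 0.8. The values of `sigma`, `alpha`, and `overlap_thresh` are placeholders.

```python
import math


def gaussian(x, y, cx, cy, sigma):
    """Unnormalized 2D Gaussian with peak 1.0 at (cx, cy)."""
    return math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))


def concentric_heatmaps(h, w, inner, outer, inner_radius, sigma=2.0,
                        alpha=0.5, overlap_thresh=0.1):
    """Hypothetical concentric Gaussian targets for the inner and outer centers.

    inner, outer: integer (x, y) centers; inner_radius limits both Gaussians.
    The blended form and overlap_thresh are assumptions, not the paper's equation.
    """
    maps = {"inner": [[0.0] * w for _ in range(h)],
            "outer": [[0.0] * w for _ in range(h)]}
    for y in range(h):
        for x in range(w):
            if (x - inner[0]) ** 2 + (y - inner[1]) ** 2 > inner_radius ** 2:
                continue  # outside the inner circle: stays 0 in both maps
            y_in = gaussian(x, y, inner[0], inner[1], sigma)
            y_out = gaussian(x, y, outer[0], outer[1], sigma)
            if y_in > overlap_thresh and y_out > overlap_thresh:
                # Overlap: both maps share one blended value, clipped to [0, 0.5].
                shared = min(max(alpha * y_in + (1 - alpha) * y_out, 0.0), 0.5)
                y_in = y_out = shared
            maps["inner"][y][x], maps["outer"][y][x] = y_in, y_out
    for name, (cx, cy) in (("inner", inner), ("outer", outer)):
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):  # four neighbors
            if 0 <= cx + dx < w and 0 <= cy + dy < h:
                maps[name][cy + dy][cx + dx] = 0.8
        maps[name][cy][cx] = 1.0  # the center itself is the positive sample
    return maps
```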

Double Center to Circular Boxes
During inference, the model outputs two heat maps $\hat{Y} \in [0, 1]^{H \times W \times 1}$ whose values are the confidence scores of the inner and outer centers. The confidence scores are sorted, and the position of the maximum value in each map is the predicted center point $(x_k, y_k)$, $k \in \{\text{inner}, \text{outer}\}$. The corresponding radius and offset are then read directly at that center point.
For the center point position $(x, y)$, the predicted radius $\hat{r}$, and the offset $(\delta_x, \delta_y)$, the corresponding circular bounding box coordinates are calculated as follows: where $s$ is the scale ratio between the feature map and the input size. Owing to the design of the backbone network, we set $s = 1$ in our experiments; therefore, the predicted radius is measured in original image pixels, and $(\hat{x}, \hat{y}, \hat{r})$ is the final predicted result.
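Decoding one heat map can be sketched as follows (call it once for the inner and once for the outer map). The box formula is not spelled out in this section, so the sketch assumes x_hat = s * (x + dx), y_hat = s * (y + dy), r_hat = s * r, and an axis-aligned box (x_hat - r_hat, y_hat - r_hat, x_hat + r_hat, y_hat + r_hat); the function names are hypothetical. Taking the argmax gives the same center as sorting the scores and keeping the largest.

```python
def argmax_2d(heatmap):
    """Return (x, y, score) of the largest value in a 2D list heat map."""
    best_x, best_y, best = 0, 0, heatmap[0][0]
    for y, row in enumerate(heatmap):
        for x, value in enumerate(row):
            if value > best:
                best_x, best_y, best = x, y, value
    return best_x, best_y, best


def decode_circle(heatmap, radius_map, offset_map, s=1.0):
    """Hypothetical decoding of one center heat map into (x_hat, y_hat, r_hat, score).

    Assumes x_hat = s * (x + dx), y_hat = s * (y + dy), r_hat = s * r, with the
    radius r and offset (dx, dy) read at the peak location; s = 1 in the paper.
    """
    x, y, score = argmax_2d(heatmap)
    dx, dy = offset_map[y][x]
    r = radius_map[y][x]
    return s * (x + dx), s * (y + dy), s * r, score


def circle_to_box(x_hat, y_hat, r_hat):
    """Axis-aligned bounding box (x1, y1, x2, y2) of a circle (assumed convention)."""
    return x_hat - r_hat, y_hat - r_hat, x_hat + r_hat, y_hat + r_hat
```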

Experimental Environment
We use the Adam optimizer with randomly initialized model weights. The initial learning rate is 0.001, the weight decay is 0.0004, and the mini-batch size is 6. The learning rate is adjusted automatically during training, and the model is trained for 120 epochs in total.
In the inference phase, we resize the input image to the dataset-specific input resolution and then perform a single forward pass to directly obtain the predicted circle bounding boxes (inner and outer) and the iris mask.

Data Set
NICE-II [31] comprises two non-overlapping subsets: (i) a training set with 1000 images from 171 subjects and (ii) a testing set with 1000 images from 150 subjects. We use the entire training set for training and the entire testing set for testing.
MICHE [32] is composed of three sub-databases, GS4, IP5, and GT2, whose images were captured in an uncontrolled environment with three mobile devices. MICHE-GS4 has 1297 images (663 indoor and 634 outdoor), MICHE-IP5 has 1262 images (631 indoor and 631 outdoor), and MICHE-GT2 has 632 images (316 indoor and 316 outdoor). For each sub-data set, we use all indoor images for training and all outdoor images for testing.
For the four datasets NICE-II, MICHE-GT2, MICHE-GS4, and MICHE-IP5, the image resolution (width × height) is 300 × 400, 400 × 300, 270 × 444, and 270 × 444, respectively. Because our network is a fully convolutional encoder-decoder network, the input width and height must be multiples of 32; therefore, the input sizes for these datasets are 320 × 448, 448 × 320, 320 × 448, and 320 × 448, respectively. Apart from bringing the width and height to these sizes with central cropping, no other data processing is applied.
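The resolution adjustment can be sketched as a center pad-or-crop to the target size. The text only mentions central cropping. Zero-padding the dimensions that are smaller than the target (e.g., 300 → 320) is our assumption, and `center_fit` is a hypothetical helper.

```python
import numpy as np

def center_fit(img, out_h, out_w, pad_value=0):
    """Center-crop and/or center-pad an (H, W[, C]) image to (out_h, out_w[, C]).

    Padding with pad_value where the image is smaller is an assumption.
    """
    if out_h % 32 or out_w % 32:
        raise ValueError("encoder-decoder input sides must be multiples of 32")
    h, w = img.shape[:2]
    out = np.full((out_h, out_w) + img.shape[2:], pad_value, dtype=img.dtype)
    ch, cw = min(h, out_h), min(w, out_w)              # size of the copied window
    sy, sx = (h - ch) // 2, (w - cw) // 2              # centered crop offsets (source)
    dy, dx = (out_h - ch) // 2, (out_w - cw) // 2      # centered pad offsets (destination)
    out[dy:dy + ch, dx:dx + cw] = img[sy:sy + ch, sx:sx + cw]
    return out

# NICE-II: width x height 300 x 400 -> 320 x 448 (arrays are height-first).
nice = center_fit(np.ones((400, 300, 3), dtype=np.uint8), 448, 320)
# A 40 x 40 image cropped to 32 x 32 keeps its central window.
crop = center_fit(np.arange(1600).reshape(40, 40), 32, 32)
```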
The images in these databases were captured in user non-cooperative, visible-light environments with mobile devices rather than dedicated imaging sensors (such as infrared iris cameras). They therefore usually contain light reflection, gaze deviation, defocus, specular reflection, and other noise. Some example eye images and their ground truths are shown in Figure 6. We aim to localize the inner and outer iris boundaries more accurately and to remove this noise to obtain the valid iris pixels.

Figure 6. Example images and corresponding ground truths (including the iris inner boundary (green), iris outer boundary (red), and iris mask (white)) of the four iris databases.

Evaluation Protocols
We adopt several evaluation protocols for inner-/outer-circle localization and iris segmentation to evaluate the proposed method. (1) Localization. We compute the box IoU of the inner and outer circles, $mIoU_{box}$, which ranges over [0, 1]; the closer the value is to 1, the better the localization. Following [7], we also compute the Hausdorff distance, with point coordinates normalized so that the normalized Hausdorff distance lies in [0, 1]; the smaller the value, the higher the shape similarity. (2) Segmentation. Following [7], we use $E1_{mask}$ and $E1_{norm}$, the error rates of the iris mask and the normalized iris mask, respectively. Both range over [0, 1], and smaller values are better. In addition, we use $mIoU_{mask}$, which also ranges over [0, 1], to evaluate segmentation performance; larger values are better.
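A minimal sketch of these per-image metrics follows. Several details are assumptions rather than the exact protocol of [7]:

- circles are given as (x, y, r);
- box IoU uses the axis-aligned square enclosing each circle;
- the Hausdorff distance is computed after dividing x by the image width and y by the image height, then dividing by √2 so the result stays in [0, 1];
- E1 is the fraction of disagreeing pixels, as in the NICE.I protocol.

The `mIoU` figures would then be averages of these per-image values over the test set.

```python
import numpy as np

def circle_box_iou(c1, c2):
    """IoU of the axis-aligned square boxes enclosing circles c1, c2 = (x, y, r)."""
    (x1, y1, r1), (x2, y2, r2) = c1, c2
    iw = max(0.0, min(x1 + r1, x2 + r2) - max(x1 - r1, x2 - r2))
    ih = max(0.0, min(y1 + r1, y2 + r2) - max(y1 - r1, y2 - r2))
    inter = iw * ih
    union = (2 * r1) ** 2 + (2 * r2) ** 2 - inter
    return inter / union if union > 0 else 0.0

def normalized_hausdorff(p, q, width, height):
    """Symmetric Hausdorff distance between (N, 2) and (M, 2) point sets of (x, y).

    Normalization (divide by image size, then by sqrt(2)) is an assumption.
    """
    scale = np.array([width, height], dtype=float)
    p = np.asarray(p, dtype=float) / scale
    q = np.asarray(q, dtype=float) / scale
    d = np.linalg.norm(p[:, None, :] - q[None, :, :], axis=-1)  # pairwise distances
    hd = max(d.min(axis=1).max(), d.min(axis=0).max())         # both directed distances
    return float(hd / np.sqrt(2.0))

def e1_error(pred, gt):
    """Fraction of pixels where two binary masks disagree (XOR)."""
    return float(np.logical_xor(np.asarray(pred, bool), np.asarray(gt, bool)).mean())

def mask_iou(pred, gt):
    """Intersection over union of two binary masks."""
    pred, gt = np.asarray(pred, bool), np.asarray(gt, bool)
    union = np.logical_or(pred, gt).sum()
    return float(np.logical_and(pred, gt).sum() / union) if union else 1.0
```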

Ablation Study
We conduct an ablation study to evaluate the influence of the concentric sampling strategy. Results on the NICE-II [31] and MICHE [32] data sets are shown in Table 1. In the MICHE database, the images contain irrelevant regions such as the chin, the nose, and the background, as shown in the middle column of Figure 6. On MICHE, this auxiliary training strategy brings a relatively significant performance gain of nearly 4.79%. The results also show that making the inner- and outer-circle Gaussian distributions share the same Gaussian value in the overlapping region helps the model focus on the critical region. Sampling over the central region is therefore suitable for iris localization on non-standard iris images. In the NICE-II database, the images contain only the eye region, as shown in the first column of Figure 6, and here using CGR degrades model performance. The main reason is the low similarity in distribution between the NICE-II and MICHE data sets: on NICE-II, CGR introduces many positive samples, which distract the model's attention.
Without CGR, the model only needs to focus on two positive samples (the center points of the inner and outer circles), which makes it more accurate.

Table 1. Ablation study of concentric sampling. "↑" denotes the improvement in localization performance after using the concentric sampling strategy for auxiliary training.

Comparison with Other Methods and Discussion
To verify the effectiveness of our model, we compare the proposed method, ICSNet, with other state-of-the-art methods: FCEDN [18], IrisDenseNet [19], FRED-Net [20], IrisParseNet [7], and NIR-Zhang [8]. Tables 2 and 3 summarize the iris localization and iris segmentation performance of the proposed approach and the baseline methods under the evaluation protocols described above. We also report the number of parameters and FLOPs to further evaluate the proposed approach.
For localization, FCEDN [18], IrisDenseNet [19], and FRED-Net [20] use an integro-differential operator or a circular Hough transform to fit the inner and outer circles on the predicted iris mask. IrisParseNet [7] predicts the iris mask and two iris edge masks, and then applies circle fitting on the edge masks to obtain the circular inner and outer iris boundaries. NIR-Zhang [8] uses an independent model to predict the inner and outer contours, and then applies a circular Hough transform to fit the inner and outer circles. As described in Section 3.1, our method predicts the inner and outer circles directly from the feature map. IrisParseNet [7] and NIR-Zhang [8] consistently outperform the other three methods on all four databases, particularly on the three noisy visible-light MICHE databases. This shows that fitting circles on contour masks is better than fitting them on iris masks. Moreover, our approach achieves better localization results than all five methods on every database. On NICE-II, ICSNet achieves 84.02% box mIoU, outperforming the other methods by nearly 1.02%; on MICHE, ICSNet outperforms the other methods by nearly 9.28% in average box mIoU. These results indicate that predicting the iris circles on the feature map is more accurate and robust.
The numbers of parameters and FLOPs of the different methods (at an input resolution of 320 × 320 pixels) are shown in Table 4. Our model requires only 47.01 GFLOPs, and the experimental results show that it strikes a good balance between accuracy and efficiency. To better compare the iris localization and segmentation results, we select several challenging samples and present their normalization results. The visualization results of ICSNet and the other methods on the different databases are shown in Figure 7, where the inner and outer circular boundaries are marked in green and red, respectively. If locating the inner and outer circles fails, the normalized iris result is empty. For irregular and noisy iris images captured under visible illumination in a non-cooperative environment, the other mask-based fitting methods cannot achieve accurate and robust iris localization, whereas our double-center-based localization method handles these cases. This indicates that localizing circles directly from feature maps is better than fitting them to iris or contour masks.

Figure 7. The results of the different methods on NICE-II [31] and MICHE [32]. For non-normalized and normalized masks, blue indicates true positive pixels, green represents false positive pixels, and red represents false negative pixels. If locating the inner and outer circles fails, the normalized iris result is empty, which is indicated by a black bar image.
The existing methods rely on considerable post-processing to fit the inner- and outer-circle parameters on the iris or contour mask. Because these methods depend heavily on the segmented iris mask rather than on image features, they show weak robustness and low localization accuracy in non-cooperative environments. Under the effects of illumination, reflection, and noise, the shape of the segmented iris mask is often irregular. Our double-center-based localization method relies on the feature map rather than on the segmented iris or contour mask, and the model directly predicts the inner- and outer-circle positions. It therefore achieves good localization performance and strong robustness on iris images captured in a non-cooperative environment with visible illumination.

Conclusions
In this paper, we propose a novel double-center-based efficient iris localization and segmentation network. Prior research applies post-processing methods (e.g., the circular Hough transform or the integro-differential operator) on iris or contour masks to fit the inner and outer circles. Unlike these methods, ICSNet relies on the feature map to directly predict the centers and regress the radii. Furthermore, we propose a novel auxiliary sampling strategy to accelerate model training on non-standard iris images captured in a non-cooperative environment with visible illumination. The proposed approach is compared with other methods on four representative iris image databases, and the experiments show that it achieves excellent iris-localization performance.
Despite these advantages, the proposed method has some limitations. The serial network design, which performs localization first and segmentation second, increases the time the model needs to process the two tasks. In addition, optimizing radius regression with an L1 loss does not take into account the pairing of the inner and outer iris radii. Finally, although the model improves most metrics, its segmentation performance on some data sets still needs to be improved.
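A minimal sketch makes the limitation of the L1 radius loss visible. It assumes, hypothetically, that the loss is evaluated only at ground-truth center pixels, in the style of CenterNet, using the hypothetical name `radius_l1_loss`. The total is a plain sum of independent inner and outer terms. Nothing in it ties the two radii together, for example through their ratio.

```python
import numpy as np

def radius_l1_loss(pred_r, gt_r, center_mask):
    """Per-category L1 radius loss evaluated at ground-truth center pixels.

    pred_r, gt_r: (2, H, W) radius maps for [inner, outer] (hypothetical layout).
    center_mask:  (2, H, W) boolean, True where a ground-truth center lies.
    """
    losses = []
    for k in range(pred_r.shape[0]):
        m = center_mask[k]
        n = max(int(m.sum()), 1)                                # avoid division by zero
        losses.append(float(np.abs(pred_r[k][m] - gt_r[k][m]).sum() / n))
    # The total is a plain sum: no term couples the inner and outer radii.
    return losses, sum(losses)

pred = np.zeros((2, 4, 4)); gt = np.zeros((2, 4, 4)); mask = np.zeros((2, 4, 4), bool)
mask[0, 1, 1] = mask[1, 2, 2] = True
pred[0, 1, 1], gt[0, 1, 1] = 10.0, 12.0    # inner radius off by 2
pred[1, 2, 2], gt[1, 2, 2] = 40.0, 37.0    # outer radius off by 3
losses, total = radius_l1_loss(pred, gt, mask)
```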
In the future, we will explore more efficient approaches to improving segmentation performance on irregular iris images captured under visible illumination, and we will consider optimizing the radius regression loss function. We will also consider redesigning the network structure to handle iris localization and iris segmentation in parallel, reducing the total processing time.

Funding: This work is partially supported by grants from the Natural Science Foundation of Chongqing, China (grant no. CSTB2022NSCQ-MSX0493) and the Key Project of Chongqing Technology Innovation and Application Development (grant no. cstc2021jscx-dxwtBX0018).